Semi-supervised Text Categorization Using Recursive K-means Clustering
نویسندگان
چکیده
In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the principle of divide and conquer strategy. It uses recursive K-means algorithm for partitioning both labeled and unlabeled data collection. The K-means algorithm is applied recursively on each partition till a desired level partition is achieved such that each partition contains labeled documents of a single class. Once the desired clusters are obtained, the respective cluster centroids are considered as representatives of the clusters and the nearest neighbor rule is used for classifying an unknown text document. Series of experiments have been conducted to bring out the superiority of the proposed model over other recent state of the art models on 20Newsgroups dataset.
منابع مشابه
Text Categorization using the Semi-Supervised Fuzzy c-Means Algorithm
Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. For the past few years, TC has become very important essentially in the Information Retrieval area, where information needs have tremendously increased with the rapid growth of textual information sources such as the Internet. In this paper, we compare , for text categoriz...
متن کاملA Novel Multi Label Learning Based on Clustering Integrated Ensemble Classifier Chain Micro Prediction Models
Most of the real world problems are concerned with assignment of multiple target labels to the instances. The proposed model aims to increase the accuracy by incorporating supervised and semi supervised learning. K Means clustering is employed which creates K clusters based on the initialization of cluster centroids. Datasets are clustered based on its distribution in the Euclidean space. Clust...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کاملWised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...
متن کاملIterative Double Clustering for Unsupervised and Semi-Supervised Learning
We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the struc...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016